ABSTRACT

A strategy is proposed for combining scores from 
multiple-choice achievement measures with performance assessments. 
The specific situation discussed involves the revision of a 
curriculum-based multiple-choice and performance assessment testing 
program for grades 1 through 6 for a large school district. Reading, 
language-arts, and mathematics achievement will be assessed at each 
grade level using multiple-choice tests and at least one performance 
assessment. Curriculum and assessments were developed locally by 
teacher task groups. Approximately 48,000 examinees will respond to 
the multiple-choice tests and complete at least one performance 
assessment. A scaling test composed of a sample of items from each 
grade will be constructed and administered to about 300 examinees at 
each grade level. Multiple-choice tests will be machine scored and 
about 20% of the performance assessments at each grade level will be 
scored by trained scorers. The process to develop scaled scores over 
grade levels and to combine the scores is described in detail. The 
procedure will attempt to combine common-anchor test design and 
scaling-test methods. (Contains 10 references.) (SLD) 
Introduction 



This is a concept paper that proposes a strategy for combining 
scores from multiple-choice achievement measures with performance 
assessments. The specific situation involves the revision of a 
curriculum-based multiple-choice and performance assessment 
testing program for grades 1 through 6 for a large school 
district. Reading, language arts, and mathematics achievement 
will be assessed at each grade level using multiple-choice tests 
of 30 to 45 items and at least one performance assessment in the 
areas of reading and mathematics. 

The curriculum, multiple-choice items, and performance 
assessments were all locally developed by teacher task groups 
with support from curriculum and measurement consultants. The 
assessment development program recently completed large-scale 
field testing of the pools of multiple-choice items and 
performance assessments and is in the process of developing 
'final' multiple-choice test forms, revised performance 
assessments, and scoring rubrics for May administration and 
scoring. 

The multiple-choice tests are designed to have an overlap of five 
to seven common anchor items between grade levels to allow 
scaling to achieve comparable scores within content areas. 
School district administrators wish to have a scaled score in 
each content area over the six grade levels. Additionally, they 
want to combine the multiple-choice results with the performance 
results for each content area in such a manner as to weight the 
performance assessment as one-fourth of the composite. The 
following presents a proposed strategy to accomplish the scaling 
of the multiple-choice measures and to combine the scaled 
multiple-choice scores with the performance assessment scores. 

Method 

There will be approximately forty-eight thousand examinees in 
grades one through six responding to multiple-choice tests in 
reading, language arts, and mathematics. Additionally, examinees 
will be administered at least one reading performance assessment 
and one mathematics performance assessment at each grade level. 
A scaling test composed of a sampling of items from each grade 
level test will be constructed and administered to approximately 
300 examinees at each grade level. The multiple-choice tests 
will be machine scored and a sample of approximately 20% of the 
performance assessments at each grade level will be scored by a 
task group of trained scorers providing at least two ratings per 
response. 

The multiple-choice test results will be analyzed using either a 
polytomous scoring model [e.g., max-alpha (Guttman, 1941) or 
polyweighting (Sympson, 1983, 1986, 1988)] or an item response 
theory (IRT) model [e.g., the three-parameter model (Lord & Novick, 
1968)]. The BILOG computer program (Mislevy & Bock, 1990) allows 
test form equating using common anchor or linking items between 
forms of the test for the one-, two-, and three-parameter item 
response theory models. Sympson (1990) developed a program 
employing a polytomous scoring model that allows equating among 
test forms with overlapping item sets. Sympson's model is similar 
to Guttman's (1941) polytomous scoring strategy, which provides an 
optimization of coefficient alpha and, therefore, is often 
referred to as max-alpha. Max-alpha uses the concept of the option 
mean: the mean total test score of all examinees choosing an 
option. Sympson's polyweighting replaces the option mean with the 
mean percentile rank of examinees selecting each option. 
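
The two option-level statistics named above can be illustrated with a 
short sketch. It covers a single item and a single pass only; it does 
not reproduce the full max-alpha or polyweighting procedures, and the 
function names and toy data are assumptions made for illustration, not 
part of the cited programs. 

```python
from collections import defaultdict


def percentile_ranks(scores):
    """Mid-percentile rank (0-100) of each score within the list."""
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(1 for t in scores if t < s)
        equal = sum(1 for t in scores if t == s)
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def option_weights(choices, criterion):
    """Mean criterion value of the examinees who chose each option."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for option, value in zip(choices, criterion):
        sums[option] += value
        counts[option] += 1
    return {option: sums[option] / counts[option] for option in sums}


# Invented toy data: six examinees, one item with options A, B, C.
choices = ["A", "B", "A", "C", "B", "A"]
totals = [10, 4, 8, 2, 6, 12]  # total test scores

# Option mean: mean total score of the examinees choosing each option.
option_means = option_weights(choices, totals)

# Polyweighting-style weight: mean percentile rank per option.
poly_weights = option_weights(choices, percentile_ranks(totals))

print(option_means)
print(poly_weights)
```

In both cases an examinee's item score would be the weight attached to 
the option chosen, so that distractors chosen by more able examinees 
earn more credit than distractors chosen by less able examinees. 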

Performance assessments will be scored using a scoring model 
that partitions score variance into facets (Linacre, 1989), which 
allows variability due to raters and prompts to be removed from 
the examinees' score distribution. 
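
The paper does not state the model equation. As a point of reference 
only, the many-facet Rasch model associated with Linacre (1989) is 
commonly written in the form below; the symbols are the conventional 
ones and are an assumption here, not notation taken from this paper. 

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here P_{nijk} is the probability that examinee n receives rating 
category k from rater j on prompt i, B_n is the examinee's ability, 
D_i is the difficulty of the prompt, C_j is the severity of the rater, 
and F_k is the difficulty of the step from category k-1 to category k. 
Because rater and prompt effects are estimated as separate terms, the 
ability estimate B_n (in logits) is adjusted for them. 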

The following presents the plan to develop scaled scores over 
grade levels and combine performance assessment ratings with the 
scaled scores. The procedures described will employ cross 
validation. 
Steps in the process: 



1. Using either an IRT or classical polytomous model, determine 
scaled scores over the grade levels for the multiple-choice 
tests within each content area. The procedure will attempt 
to combine the common-item anchor-test design (Angoff, 1984) 
with scaling-test methods (Petersen, Kolen, & Hoover, 1989). 
Mittman (1958) found less grade-to-grade overlap for the 
scaling-test method than for the anchor-item method. Both 
methods will be used and combined to maximize the likelihood 
of developing an educationally relevant score scale. 

2. Next, convert each grade level's distribution of scaled 
scores on the multiple-choice tests to percentile ranks. 

3. Determine the examinee ability logits for the performance 
assessments using Facets. 

4. Determine the percentile ranks of the performance assessment 
logits within grade levels. 

5. Equate performance assessment logits to multiple-choice 
scale scores using equipercentile equating. 

6. Score each examinee by determining the linear combination of 
the multiple-choice scale score and performance assessment 
score using appropriate weighting. 

7. Convert the combined score to a score scale with desirable 
characteristics. 

8. Convert combined scaled scores to percentile ranks within 
grades. 
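
Steps 2 and 4 through 6 can be sketched for one grade and one content 
area as follows. This is a minimal illustration under stated 
assumptions: the distributions are invented, percentile ranks use the 
mid-percentile convention, equating interpolates linearly between 
observed scores with no smoothing, and the 0.25 weight reflects the 
district's wish to weight the performance assessment as one-fourth of 
the composite. Step 7 is omitted because the target scale is not 
specified. All function names are hypothetical. 

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, dist):
    """Mid-percentile rank (0-100) of value within the sorted list dist."""
    below = bisect_left(dist, value)
    equal = bisect_right(dist, value) - below
    return 100.0 * (below + 0.5 * equal) / len(dist)


def score_at_rank(rank, dist):
    """Score in the sorted list dist at a percentile rank (interpolated)."""
    points = sorted(set(dist))
    ranks = [percentile_rank(p, dist) for p in points]
    if rank <= ranks[0]:
        return float(points[0])
    if rank >= ranks[-1]:
        return float(points[-1])
    i = bisect_right(ranks, rank)
    frac = (rank - ranks[i - 1]) / (ranks[i] - ranks[i - 1])
    return points[i - 1] + frac * (points[i] - points[i - 1])


def equate(logit, perf_dist, mc_dist):
    """Step 5: map a performance logit to the multiple-choice scale score
    that holds the same percentile rank (equipercentile equating)."""
    return score_at_rank(percentile_rank(logit, perf_dist), mc_dist)


def composite(mc_scaled, perf_equated, perf_weight=0.25):
    """Step 6: weighted linear combination of the two scores."""
    return (1.0 - perf_weight) * mc_scaled + perf_weight * perf_equated


# Invented within-grade distributions (both must be sorted).
mc_dist = sorted([200, 210, 220, 230, 240])      # scaled scores (steps 1-2)
perf_dist = sorted([-1.0, -0.5, 0.0, 0.5, 1.0])  # ability logits (steps 3-4)

# One examinee: multiple-choice scaled score 230, performance logit 0.0.
equated = equate(0.0, perf_dist, mc_dist)
combined = composite(230, equated)
print(equated, combined)
```

Step 8 would then apply percentile_rank to each examinee's combined 
score within the sorted distribution of combined scores for that grade. 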

Conclusion 

The primary focus of the curriculum and test revision project is 
to improve instruction and learning, and the primary 
interpretations of test results will be curriculum (or criterion) 
based. The scaled scores, therefore, will not be an initial 
focus of test interpretation. It is most likely that, if the 
scaling is judged to be somewhat successful, an effort to improve 
the tests and scaled scores will follow. 